AITopics | asymptotic convergence rate

Optimal Asymptotic Rates for (Stochastic) Gradient Descent under the Local PL-Condition: A Geometric Approach

arXiv.org Machine Learning, May-15-2026

Stochastic gradient descent (SGD) has been studied extensively over the past decades due to its simplicity and broad applicability in machine learning. In this work, we analyze the local behavior of gradient descent and stochastic gradient descent for minimizing $C^2$-functions that satisfy the Polyak-Łojasiewicz (PL) inequality, under a multiplicative gradient noise model motivated by overparameterized neural networks. Using a geometric interpretation of the PL-condition, we prove a simple yet surprising fact: in this possibly non-convex setting, the asymptotic convergence rate of (S)GD matches the rate obtained for strongly convex quadratics.

artificial intelligence, machine learning, projn, (17 more...)

2605.14663

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
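
The abstract above compares (S)GD under the PL-condition with the rate obtained for strongly convex quadratics. As a rough illustration of that quadratic baseline only (not the paper's method or its optimal rate), the sketch below runs plain gradient descent on a diagonal quadratic and compares the observed per-step contraction of the function value with the textbook prediction. The function name `gd_quadratic`, the eigenvalues `mu` and `L`, and the step size `1 / L` are all assumptions chosen for the example.

```python
import numpy as np

def gd_quadratic(eigs, x0, step, iters):
    """Gradient descent on f(x) = 0.5 * sum(eigs * x**2); returns the f-values."""
    eigs = np.asarray(eigs, dtype=float)
    x = np.asarray(x0, dtype=float)
    values = []
    for _ in range(iters):
        values.append(0.5 * np.sum(eigs * x**2))
        x = x - step * eigs * x  # the gradient of f at x is eigs * x
    return np.array(values)

# Hypothetical problem: smallest eigenvalue mu, largest eigenvalue L.
mu, L = 1.0, 10.0
vals = gd_quadratic([mu, L], x0=[1.0, 1.0], step=1.0 / L, iters=50)

# Observed contraction of f per step near the end of the run.
empirical_rate = vals[-1] / vals[-2]
# With step 1/L the slowest coordinate shrinks by (1 - mu/L) per step,
# so the function value shrinks by its square.
predicted_rate = (1.0 - mu / L) ** 2
print(empirical_rate, predicted_rate)
```

With these assumed values the slow coordinate dominates after the first step, so the two printed numbers should agree closely; a different step size would change the predicted factor.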

Convergence of Cubic Regularization for Nonconvex Optimization under KL Property

Neural Information Processing Systems, Mar-17-2026, 00:03:24 GMT

Cubic-regularized Newton's method (CR) is a popular algorithm that is guaranteed to produce a second-order stationary solution for nonconvex optimization problems. However, the existing understanding of the convergence rate of CR is conditioned on special types of geometrical properties of the objective function. In this paper, we explore the asymptotic convergence rate of CR by exploiting the ubiquitous Kurdyka-Łojasiewicz (KL) property of nonconvex objective functions. Specifically, we characterize the asymptotic convergence rate of various optimality measures for CR, including the function value gap, variable distance gap, gradient norm, and least eigenvalue of the Hessian matrix. Our results fully characterize the diverse convergence behaviors of these optimality measures in the full parameter regime of the KL property. Moreover, we show that the obtained asymptotic convergence rates of CR are order-wise faster than those of first-order gradient descent algorithms under the KL property.

artificial intelligence, machine learning, proceedings, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Stochastic Expectation Maximization with Variance Reduction

Neural Information Processing Systems, Mar-16-2026, 23:31:35 GMT

Expectation-Maximization (EM) is a popular tool for learning latent variable models, but vanilla batch EM does not scale to large data sets because the whole data set is needed at every E-step. Stochastic Expectation Maximization (sEM) reduces the cost of the E-step by stochastic approximation. However, sEM has a slower asymptotic convergence rate than batch EM and requires a decreasing sequence of step sizes, which is difficult to tune. In this paper, we propose a variance-reduced stochastic EM (sEM-vr) algorithm inspired by variance-reduced stochastic gradient descent algorithms. We show that sEM-vr has the same exponential asymptotic convergence rate as batch EM. Moreover, sEM-vr requires only a constant step size to achieve this rate, which alleviates the burden of parameter tuning. We compare sEM-vr with batch EM, sEM, and other algorithms on Gaussian mixture models and probabilistic latent semantic analysis, and sEM-vr converges significantly faster than these baselines.

artificial intelligence, machine learning, proceedings, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Convergence of Cubic Regularization for Nonconvex Optimization under KL Property

Yi Zhou, Zhe Wang, Yingbin Liang

Neural Information Processing Systems, Feb-14-2026, 06:21:23 GMT

Cubic-regularized Newton's method (CR) is a popular algorithm that guarantees

artificial intelligence, convergence rate, machine learning, (15 more...)

Country:

North America > United States > Ohio (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

aba22f748b1a6dff75bda4fd1ee9fe07-Paper.pdf

Neural Information Processing Systems, Feb-14-2026, 02:45:07 GMT

algorithm, step size, variance, (15 more...)

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Asia > Middle East > Jordan (0.04)
North America > Canada > Quebec > Montreal (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
(2 more...)

Ruppert-Polyak averaging for Stochastic Order Oracle

Smirnov, V. N., Kazistova, K. M., Sudakov, I. A., Leplat, V., Gasnikov, A. V., Lobanov, A. V.

arXiv.org Artificial Intelligence, Nov-24-2024

Black-box optimization, a rapidly growing field, faces challenges due to limited knowledge of the objective function's internal mechanisms. One promising approach to address this is the Stochastic Order Oracle Concept. This concept, similar to other Order Oracle Concepts, relies solely on relative comparisons of function values without requiring access to the exact values. This paper presents a novel, improved estimation of the covariance matrix for the asymptotic convergence of the Stochastic Order Oracle Concept. Our work surpasses existing research in this domain by offering a more accurate estimation of asymptotic convergence rate. Finally, numerical experiments validate our theoretical findings, providing strong empirical support for our proposed approach.

artificial intelligence, machine learning, optimization problem, (13 more...)

2411.15866

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Asymptotic Convergence Rate of Alternating Minimization for Rank One Matrix Completion

Liu, Rui, Olshevsky, Alex

arXiv.org Machine Learning, Aug-11-2020

We study alternating minimization for matrix completion in the simplest possible setting: completing a rank-one matrix from a revealed subset of the entries. We bound the asymptotic convergence rate by the variational characterization of eigenvalues of a reversible consensus problem. This leads to a polynomial upper bound on the asymptotic rate in terms of the number of nodes as well as the largest degree of the graph of revealed entries.

artificial intelligence, machine learning, matrix, (16 more...)

2008.04988

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
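
The setting in the abstract above, completing a rank-one matrix from a revealed subset of its entries, can be sketched with the usual least-squares alternating updates. This is a minimal sketch under assumptions, not necessarily the exact formulation analyzed in the paper: the function name `altmin_rank_one`, the all-ones initialization, the small denominator guard, and the 3x3 example with a hand-picked connected pattern of revealed entries are all hypothetical.

```python
import numpy as np

def altmin_rank_one(M, mask, iters=200):
    """Alternating least squares for M ~ outer(u, v) on entries where mask == 1."""
    observed = mask * M              # zero out the unrevealed entries
    v = np.ones(M.shape[1])          # assumed initialization
    for _ in range(iters):
        # For fixed v, each u[i] minimizes sum_j mask[i, j] * (M[i, j] - u[i] * v[j])**2.
        u = observed @ v / np.maximum(mask @ v**2, 1e-12)
        # For fixed u, each v[j] solves the symmetric least-squares problem.
        v = observed.T @ u / np.maximum(mask.T @ u**2, 1e-12)
    return u, v

# Hypothetical example: positive rank-one matrix, six of nine entries revealed.
u_true = np.array([1.0, 2.0, 3.0])
v_true = np.array([2.0, 1.0, 4.0])
M_true = np.outer(u_true, v_true)
mask = np.array([[1.0, 1.0, 0.0],
                 [0.0, 1.0, 1.0],
                 [1.0, 0.0, 1.0]])  # the revealed-entry graph is connected

u, v = altmin_rank_one(M_true, mask)
print(np.abs(np.outer(u, v) - M_true).max())
```

Only the product `outer(u, v)` is identifiable, since scaling `u` up and `v` down by the same factor leaves the matrix unchanged, so the check compares the reconstructed matrix rather than the factors themselves.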
